Regulating Orthography-Phonology Relationship for English to Thai Transliteration
Abstract
In this paper, we describe our submission to the Named Entities Workshop (NEWS) 2016 transliteration shared task, where we focus on English to Thai transliteration. The alignment between Thai orthography and phonology is not always monotonic, but few transliteration systems take this into account. In our proposed system, we exploit phonological knowledge to resolve problematic instances where the monotonic alignment assumption breaks down. We achieve a 29% relative improvement over the baseline system for the NEWS 2016 transliteration shared task.
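To make the non-monotonic alignment concrete, the following minimal sketch reorders Thai vowels that are written before a consonant but pronounced after it (for example, แฟน is written vowel-first but spoken consonant-first). This is an illustration only, not the system described in the abstract: the function name `to_spoken_order` is hypothetical, and the sketch assumes each preposed vowel is followed by a single initial consonant, ignoring consonant clusters and tone marks.

```python
# Thai vowel signs that are written before the initial consonant
# but pronounced after it: SARA E, SARA AE, SARA O, SARA AI MAIMUAN, SARA AI MAIMALAI.
PREPOSED_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def to_spoken_order(word: str) -> str:
    """Swap each preposed vowel with the character that follows it.

    Simplifying assumption (not from the paper): the character after a
    preposed vowel is a single initial consonant, so clusters are not handled.
    """
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS:
            # Move the vowel behind the consonant it phonologically follows.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    written = "\u0e41\u0e1f\u0e19"  # the word แฟน in written (orthographic) order
    print(to_spoken_order(written))  # consonant now precedes the vowel
```

After this reordering, a left-to-right aligner sees the consonant before its vowel, which matches the order of the corresponding sounds.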
Similar resources
A learning method for Thai phonetization of English words
This article tackles the problem of transcribing English words using the Thai phonological system. The problem arises in Thai, where modern writing often includes English orthography, and transcribing with English phonology yields unnatural results. The proposed model is fully data-driven, starting with automatic grapheme-phoneme alignment, modeling transduction rules and predicting Thai syllabic tones...
Can the first letter advantage be shaped by script-specific characteristics?
We examined whether the first letter advantage that has been reported in the Roman script disappears, or even reverses, depending on the characteristics of the orthography. We chose Thai because it has several "nonaligned" vowels that are written prior to the consonant but phonologically follow it in speech (e.g., แฟน <ε:fn> is spoken as /fɛ:n/) whereas other "aligned" vowels are written and sp...
Syllable-Based Thai-English Machine Transliteration
This article describes the first trial of bidirectional Thai-English machine transliteration applied to the NEWS 2010 transliteration corpus. The system relies on segmenting source-language words into syllable-like units, finding the units' pronunciations, consulting a syllable transliteration table to form target-language word hypotheses, and ranking the hypotheses with a syllable n-gram. The app...
A Chunk-based n-gram English to Thai Transliteration
In this study, a chunk-based n-gram model is proposed for English to Thai transliteration. The model is compared with three other models: a table lookup model, a decision tree model, and a statistical model. The chunk-based n-gram model achieves 67% word accuracy, which is higher than that of the other models. The performance of all models increases slightly when an English grapheme to phoneme is...
Hindi and Marathi to English NE Transliteration Tool using Phonology and Stress Analysis
Over the last two decades, most named entity (NE) machine transliteration work in India has used English as the source language and Indian languages as the target languages, relying on grapheme models with statistical probability approaches and classification tools. Much less work has been carried out on machine transliteration from Indian languages to English.